truth maintenance
$\forall$uto$\exists$$\lor\!\land$L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Karia, Rushang; Bramblett, Daniel; Dobhal, Daksh; Srivastava, Siddharth
This paper presents $\forall$uto$\exists$$\lor\!\land$L, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. $\forall$uto$\exists$$\lor\!\land$L is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) ability to evaluate LLMs of increasing sophistication by auto-generating tasks at different levels of difficulty; (b) auto-generation of ground truth that eliminates dependence on expensive and time-consuming human annotation; (c) the use of automatically generated, randomized datasets that mitigate the ability of successive LLMs to overfit to static datasets used in many contemporary benchmarks. Empirical analysis shows that an LLM's performance on $\forall$uto$\exists$$\lor\!\land$L is highly indicative of its performance on a diverse array of other benchmarks focusing on translation and reasoning tasks, making it a valuable autonomous evaluation paradigm in settings where hand-curated datasets can be hard to obtain and/or update.
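The abstract does not say how tasks and ground truth are generated, so the following is only a minimal sketch of one way the idea could work, assuming propositional logic, a depth parameter as the difficulty knob, and a truth-table check as the automatically computed ground truth. The names random_formula, evaluate, and equivalent are hypothetical and are not taken from the paper.

```python
import itertools
import random


def random_formula(depth, variables, rng):
    """Build a random propositional formula as a nested tuple.

    depth is an assumed difficulty knob: deeper formulas are harder.
    """
    if depth == 0:
        return rng.choice(variables)
    op = rng.choice(["not", "and", "or"])
    if op == "not":
        return ("not", random_formula(depth - 1, variables, rng))
    return (
        op,
        random_formula(depth - 1, variables, rng),
        random_formula(depth - 1, variables, rng),
    )


def evaluate(formula, assignment):
    """Evaluate a formula under a dict mapping variable names to booleans."""
    if isinstance(formula, str):
        return assignment[formula]
    if formula[0] == "not":
        return not evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    return (left and right) if formula[0] == "and" else (left or right)


def equivalent(f, g, variables):
    """Ground-truth check: f and g agree on every truth assignment."""
    for values in itertools.product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if evaluate(f, assignment) != evaluate(g, assignment):
            return False
    return True


# Hypothetical usage: a generated formula and a candidate "round-trip" result
# (for example, an LLM's formal -> natural language -> formal translation)
# can be compared without any human labeling.
rng = random.Random(0)
task = random_formula(3, ["p", "q", "r"], rng)
```

Under these assumptions, a translation preserves truth exactly when equivalent returns True for the original and the recovered formula, and a fresh random seed yields a fresh dataset.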
'I can't believe that!' said Alice. 'Can't you?' the Queen said in a pitying tone. 'Try again: draw a long breath, and shut your eyes.' Alice laughed. 'There's no use trying,' she said: 'one can't believe impossible things.' 'I daresay you haven't had much practice,' said the Queen. 'When I was your age, I always did it for half-an-hour a day. Why, sometimes I've believed as many as six impossible things before breakfast.'
The contributions to this workshop indicate substantial advances in the technical foundations of the field. They also show that it is time to evaluate the existing approaches to commonsense reasoning problems. The Second International Workshop on Nonmonotonic Reasoning was held from 12-16 June 1988 in Grassau, a small village near Lake Chiemsee in southern Germany. It was jointly organized by Johan de Kleer, Matthew Ginsberg, Erik Sandewall, and myself. Financial support for the workshop came from the American Association for Artificial Intelligence (AAAI), Deutsche Forschungsgemeinschaft (DFG), The European Communities (Project Cost-13), Linköping University, and SIEMENS AG.
The Automatic Training of Rule Bases that Use Numerical Uncertainty Representations
The use of numerical uncertainty representations allows better modeling of some aspects of human evidential reasoning. It also makes knowledge acquisition and system development, test, and modification more difficult. We propose that where possible, the assignment and/or refinement of rule weights should be performed automatically. We present one approach to performing this training - numerical optimization - and report on the results of some preliminary tests in training rule bases. We also show that truth maintenance can be used to make training more efficient and ask some epistemological questions raised by training rule weights.
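The abstract names numerical optimization but gives neither the uncertainty representation nor the optimizer, so the sketch below makes two loud assumptions: rule weights are combined with a noisy-OR, and training is plain gradient descent with finite-difference gradients on squared error. The functions predict, loss, and train are hypothetical illustrations, not the authors' method.

```python
def predict(weights, evidence):
    """Assumed noisy-OR combination: each fired rule contributes its weight."""
    disbelief = 1.0
    for w, x in zip(weights, evidence):
        disbelief *= 1.0 - w * x
    return 1.0 - disbelief


def loss(weights, cases):
    """Sum of squared errors over (evidence, target belief) training cases."""
    return sum((predict(weights, ev) - target) ** 2 for ev, target in cases)


def train(weights, cases, rate=0.1, steps=2000, eps=1e-6):
    """Refine rule weights by gradient descent with finite-difference gradients."""
    weights = list(weights)
    for _ in range(steps):
        base = loss(weights, cases)
        grads = []
        for i in range(len(weights)):
            bumped = weights[:]
            bumped[i] += eps
            grads.append((loss(bumped, cases) - base) / eps)
        # Step downhill and clamp each weight to the interval [0, 1].
        weights = [min(1.0, max(0.0, w - rate * g)) for w, g in zip(weights, grads)]
    return weights


# Hypothetical training cases: which rules fired, and the belief an expert
# assigned to the conclusion in that situation.
cases = [((1, 0), 0.8), ((0, 1), 0.5), ((1, 1), 0.9)]
trained = train([0.5, 0.5], cases)
```

In this toy setup the cases are consistent with weights of 0.8 and 0.5, so the optimizer has an exact solution to find; real rule bases would generally leave a residual error.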
Second International Workshop on Nonmonotonic Reasoning
It was generally agreed that the formalization of commonsense reasoning should be a top-level item for future research. In spite of the many strong technical results that have been produced, it is still far from clear whether existing approaches are sufficient to formalize ...
An outlook on truth maintenance
Truth maintenance systems have been used in several recent problem solving systems to record justifications for deduced assertions, to track down the assumptions which underlie contradictions when they arise, and to incrementally modify assertional data structures when assumptions are retracted. A TMS algorithm is described here that is substantially different from previous systems. This algorithm performs deduction in traditional propositional logic in such a way that the premise set from which deduction is being done can be easily manipulated. A novel approach is also taken to the role of a TMS in larger deductive systems. In this approach the TMS performs all propositional deduction in a uniform manner while the larger system is responsible for controlling the instantiation of universally quantified formulae and axiom schemas.
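The three TMS duties listed at the start of this abstract (recording justifications, tracing the assumptions behind a belief, and updating beliefs when an assumption is retracted) can be illustrated with a toy justification-based sketch. It is not the algorithm the abstract describes: it handles only rules of the form "antecedents imply consequent", and it recomputes all beliefs from scratch rather than incrementally. The class SimpleTMS and its method names are hypothetical.

```python
class SimpleTMS:
    """Toy justification-based TMS (illustrative only)."""

    def __init__(self):
        self.premises = set()
        self.justifications = []  # list of (frozenset of antecedents, consequent)
        self.support = {}         # believed node -> antecedents used (None for a premise)

    def add_justification(self, antecedents, consequent):
        self.justifications.append((frozenset(antecedents), consequent))
        self._relabel()

    def assume(self, node):
        self.premises.add(node)
        self._relabel()

    def retract(self, node):
        self.premises.discard(node)
        self._relabel()

    def _relabel(self):
        # Recompute every belief by forward chaining to a fixed point.
        self.support = {p: None for p in self.premises}
        changed = True
        while changed:
            changed = False
            for antecedents, consequent in self.justifications:
                if consequent in self.support:
                    continue
                if all(a in self.support for a in antecedents):
                    self.support[consequent] = antecedents
                    changed = True

    def believed(self, node):
        return node in self.support

    def assumptions_behind(self, node):
        """Return the premises that the recorded support for node rests on."""
        if node not in self.support:
            return set()
        if self.support[node] is None:
            return {node}
        result = set()
        for antecedent in self.support[node]:
            result |= self.assumptions_behind(antecedent)
        return result


# Hypothetical usage: d follows from c, and c follows from a and b together.
tms = SimpleTMS()
tms.add_justification(["a", "b"], "c")
tms.add_justification(["c"], "d")
tms.assume("a")
tms.assume("b")
```

Retracting either premise with tms.retract("a") withdraws both c and d, while assumptions_behind("d") reports which premises a derived belief depends on, the kind of trace used to locate the assumptions underlying a contradiction.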